DPG: A Cache-Efficient Accelerator for Sorting and for Join Operators

Authors

  • Gene Cooperman
  • Xiaoqin Ma
  • Viet Ha Nguyen
Abstract

Sorting and join are at the heart of most database operations. In main-memory databases, both operations are memory bound rather than CPU bound. In particular, the step common to both, record retrieval, causes many random memory accesses. We present a new algorithm for fast record retrieval, distribute-probe-gather, or DPG. DPG is a cache-conscious two-pass algorithm for the main-memory record retrieval problem. DPG has important applications in both sorting and joins. Current main-memory sorting algorithms split their work into three phases: extraction of key-pointer pairs; sorting of the key-pointer pairs; and copying of the original records into the destination array according to the sorted key-pointer pairs. The last phase is essentially record retrieval, and it dominates today’s sorting time. Hence, using DPG in the third phase speeds up existing sorting algorithms. DPG also provides two new join methods for foreign key joins, DPG-move join and DPG-sort join. The DPG algorithm is applied twice in the new join methods to overcome the memory bottleneck: once for batch lookup in B+ tree indexes to construct the join triples, and once to join the two relations using the join triples. The resulting join methods are faster than other join methods that do not use the index on the foreign key, because DPG join is cache efficient and at the same time eliminates the time for sorting or hashing. The ideas presented for foreign key joins can also be extended to faster record-pair retrieval for spatial and temporal databases. According to our experimental results, applying DPG improves existing sorting algorithms by 30%, and our new DPG join algorithms are up to 2 times faster than existing join algorithms.
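
The three-phase sort described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout and the names `three_phase_sort`, `retrieve_naive`, and `retrieve_bucketed` are hypothetical. In particular, `retrieve_bucketed` is only one plausible reading of "distribute, probe, gather" (group the requests by source region, then visit one region at a time); the abstract does not specify the actual passes.

```python
from typing import Callable, List, Sequence, Tuple

Record = Tuple[int, str]  # (key, payload); hypothetical record layout


def retrieve_naive(records: Sequence[Record], order: Sequence[int]) -> List[Record]:
    """Copy records in the requested order; each read may land anywhere in `records`."""
    return [records[i] for i in order]


def retrieve_bucketed(records: Sequence[Record], order: Sequence[int],
                      bucket_size: int = 4) -> List[Record]:
    """Illustrative bucketed retrieval (an assumption, not the paper's exact DPG)."""
    n_buckets = (len(records) + bucket_size - 1) // bucket_size
    buckets: List[List[Tuple[int, int]]] = [[] for _ in range(n_buckets)]
    # Distribute: group (destination slot, source index) requests by source region.
    for dest, src in enumerate(order):
        buckets[src // bucket_size].append((dest, src))
    out: list = [None] * len(order)
    # Probe and gather: read one source region at a time and place each record
    # into its destination slot.
    for bucket in buckets:
        for dest, src in bucket:
            out[dest] = records[src]
    return out


def three_phase_sort(records: Sequence[Record],
                     retrieve: Callable = retrieve_naive) -> List[Record]:
    # Phase 1: extract (key, pointer) pairs; the "pointer" here is an array index.
    pairs = [(rec[0], i) for i, rec in enumerate(records)]
    # Phase 2: sort the small pairs instead of the full records.
    pairs.sort()
    # Phase 3: record retrieval, the step the abstract identifies as dominant.
    return retrieve(records, [i for _, i in pairs])
```

Both retrieval functions return the same result; they differ only in the order in which the source array is read. The sketch shows the data flow and makes no claim about measured cache behavior, which Python lists would not reproduce in any case.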

Similar resources

Relational Approach to Logical Query Optimization of XPath

To be able to handle the ever growing volumes of XML documents, effective and efficient data management solutions are needed. Managing XML data in a relational DBMS has great potential. Recently, effective relational storage schemes and index structures have been proposed as well as special-purpose join operators to speed up querying of XML data using XPath/XQuery. In this paper, we address the...

Parallel Sort-merge-join Reasoning

We present an in-memory, cross-platform, parallel reasoner for RDFS and RDFSPlus. Inferray uses carefully optimized hash-based join and sorting algorithms to perform parallel materialization. Designed to take advantage of the architecture of modern CPUs, Inferray exhibits very good use of cache and memory bandwidth. It offers state-of-the-art performance on RDFS materialization, outperforms ...

A Cache-Efficient Sorting Algorithm for Database and Data Mining Computations using Graphics Processors

We present a fast sorting algorithm using graphics processors (GPUs) that adapts well to database and data mining applications. Our algorithm uses texture mapping and blending functionalities of GPUs to implement an efficient bitonic sorting network. We take into account the communication bandwidth overhead to the video memory on the GPUs and reduce the memory bandwidth requirements. We also pr...

Efficient Evaluation of the Valid-Time Natural Join

Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the optimization of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Seco...

Distributed Join Algorithms on Thousands of Cores

Traditional database operators such as joins are relevant not only in the context of database engines but also as a building block in many computational and machine learning algorithms. With the advent of big data, there is an increasing demand for efficient join algorithms that can scale with the input data size and the available hardware resources. In this paper, we explore the implementation...

Journal:
  • CoRR

Volume: cs.DB/0308004

Publication date: 2003